AUTOMATIC LEARNING OF BELIEF FUNCTIONS

BACKGROUND OF THE INVENTION 

(a) Field of the Invention 

The present invention relates generally to belief functions and, 
more specifically, to a method for automatically learning belief functions. 

(b) Description of Related Art 

A system may have multiple information sources which are used to 
make a decision. In a target recognition situation, the information source 
may take the form of a radar sensor/detector. For example, three different 
sensors may be used when attempting to distinguish targets from decoys. 
A complication arises when two of the sensors report that an object under 
surveillance is a target, and the third sensor reports that the object is a 
decoy. This complication must be resolved to accurately recognize the 
object. 

The Dempster-Shafer theory of evidential reasoning, which is known 
to those skilled in the art, provides a means of combining information from
different, and possibly contradictory, information sources. The
Dempster-Shafer theory uses explicit representations of ignorance and 
conflict to avoid the shortcomings of classical Bayesian probability calculus. 
Dempster-Shafer theory uses belief functions (also called basic probability 
assignments or bpa's), which are generalizations of discrete probability 
functions used in Bayesian probability calculus. In Dempster-Shafer theory,
bpa's represent the distribution of probability mass in a system (i.e., how
strongly something is believed, based on the information that has been
provided). Referring back to the target recognition problem, an example bpa
for the sensor information available may be $\mu_1(\{target\}) = 0.55$,
$\mu_1(\{target, decoy\}) = 0.45$. This bpa represents the fact that 55% of
the evidence from a set of sensors considered supports the conclusion that
the observed object is a target, while the remaining 45% remains uncommitted
between the target and the decoy. Multiple sets of sensors may be used to
measure various characteristics of an object. For example, the bpa $\mu_1$
may be based on sensors that determine the shape of the object being
monitored. A second set of sensors used to produce $\mu_2$ may be based on
object size, while a third bpa $\mu_3$ may be based on sensors that monitor
the heat associated with the object. Each set of sensors is used to
determine the identity of the object being observed by using different
characteristics of the object. Each bpa represents a probability
distribution as to the certainty of the identity of an object. Sets
containing more than a single element (in this example, target and decoy)
are used to represent ambiguity or confusion. Empty sets are used to
represent conflict or disagreement of evidence. Belief functions may be
combined to provide information for further conclusions. For example,
bpa's generated based on size, shape, and heat may be combined to reach
a decision on the identity of the object under surveillance.
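
As a rough illustration of the example bpa described above, the following Python sketch stores a bpa as a mapping from sets of hypotheses to probability mass. The dictionary layout and the names FRAME, mu_1, and is_valid_bpa are assumptions made for this illustration and are not part of the disclosure.

```python
# Hypothetical representation: a bpa maps frozensets of hypotheses to mass.
FRAME = frozenset({"target", "decoy"})

mu_1 = {
    frozenset({"target"}): 0.55,  # evidence committed to "target"
    FRAME: 0.45,                  # mass left uncommitted between target and decoy
}


def is_valid_bpa(bpa, frame, tol=1e-9):
    """Return True if every focal set lies in the frame, no mass is negative,
    and the masses sum to one (within a tolerance)."""
    sets_ok = all(s <= frame and m >= 0.0 for s, m in bpa.items())
    return sets_ok and abs(sum(bpa.values()) - 1.0) <= tol


print(is_valid_bpa(mu_1, FRAME))
```

Placing mass on the two-element set, rather than splitting it between the singletons, is what lets the representation express ignorance explicitly.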

Previous applications of Dempster-Shafer theory include expert
systems, accounting systems, and sensor fusion. Despite previous
applications and the utility of Dempster-Shafer theory, there is no automatic
method for adjusting belief functions in a system. The ability to adjust the
belief functions used in a system would allow the system to "learn" from the
information provided by information sources. The ability of a system to
automatically update belief functions would, in addition to improving the
performance of the system, allow a system to determine erroneous
information sources, inappropriate information combinations, and optimal
information granularities. Therefore, there exists the need for a method of
automatically updating belief functions.

SUMMARY OF THE INVENTION 

The present invention provides a method for automatically learning
belief functions, thus providing the ability to determine erroneous information
sources, inappropriate information combinations, and optimal information
granularities, along with enhanced system performance. The present
invention may be embodied in a method of training belief functions, including
the steps of gathering information representative of an object or event;
creating a set of basic probability assignments based on said set of
information; creating combinations of said basic probability assignments;
measuring an error present in said basic probability assignments and said
combinations of basic probability assignments; calculating updates of said
basic probability assignments and said combinations of basic probability
assignments based on said error; and modifying said basic probability
assignments and said combinations of basic probability assignments with
said updates.

The invention itself, together with further objects and attendant
advantages, will best be understood by reference to the following detailed
description, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a diagram representing a hardware configuration that may be 
used with the present invention. 

FIG. 2 is a flow diagram representing the method of the present
invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
The ability to adjust, or train, belief functions based on information
provided by information sources would be very useful. The ability of a
system to automatically train belief functions would, in addition to improving
the performance of the system, allow a system to determine erroneous
information sources, inappropriate information combinations, and optimal
information granularities.

Belief training includes both supervised and unsupervised learning.
Supervised learning takes place when a desired bpa is known and an
observed bpa is available. Unsupervised learning takes place when the
desired bpa is not explicitly known, but some measurable quality or
characteristic of a good bpa is known. Both supervised and unsupervised
learning employ the same general method of learning. That is, each method
of learning generates an error term based on observed bpa's and processes
that error term to generate updates to the belief functions used by the
system.

Referring now to FIG. 1, an information system 10 is shown. The
information system 10 includes a number of information sources 20 and a
signal processing installation 30. The information sources 20 may take a
wide variety of forms including sensors capable of sensing an object or
event and reporting information to the signal processing installation 30.
Alternatively, the information sources 20 may be rules or opinions gathered
from individuals, typically experts. The outputs of the information sources
20 are signals, which represent the event being observed. The outputs of
the information sources 20 are coupled to the signal processing installation
30, which generates bpa's based on provided information and executes
software implementing the method of the present invention.

FIG. 2 is a flow diagram of a method embodying the present
invention. The flow diagram is generalized to apply to both supervised and
unsupervised learning of belief functions. Any differences in the
implementation method for supervised and unsupervised learning will be
noted with respect to each step of the flow diagram. The method as
described is executed by the signal processing installation 30, which may be
implemented as a traditional computer or workstation terminal.

As shown in FIG. 2, at block 100 the method polls the information
sources 20 to extract information. The extracted information will be used
to generate a belief function, or bpa. The output of each information source
20 is representative of an observation, a rule, an opinion, or some other
measurable phenomenon. The polling of the information sources 20 is no
different for supervised or unsupervised learning methods. Block 110
performs the function of gathering the information reported by the
information sources 20, processing the information into bpa's, and
combining the sensor bpa's in a predetermined fashion. For example, the
bpa $\mu_1$ may be based on object shape. A second set of information sources
20, used to produce $\mu_2$, may be based on object size, while a third bpa
$\mu_3$ may be based on the heat associated with the object. By combining the
three bpa's ($\mu_1$, $\mu_2$, $\mu_3$) via Dempster's rule of combination, which
is well known in the art, a fourth bpa ($\mu_0$) is created. This new bpa
provides more information as to the identity of the object being observed.
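
The combining step of block 110 might be sketched as follows. This is a minimal Python illustration of the standard, normalized form of Dempster's rule of combination, not the implementation of the invention. The function name combine and the numerical masses for the size-based and heat-based bpa's are hypothetical values chosen only for the example.

```python
from itertools import product

T = frozenset({"target"})
D = frozenset({"decoy"})
TD = T | D


def combine(m1, m2):
    """Dempster's rule of combination for two bpa's (dict: frozenset -> mass)."""
    joint = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            joint[c] = joint.get(c, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("sources are in total conflict")
    # Renormalize so the combined masses again sum to one.
    return {s: m / (1.0 - conflict) for s, m in joint.items()}


mu_1 = {T: 0.55, TD: 0.45}  # shape-based bpa from the example in the text
mu_2 = {T: 0.60, TD: 0.40}  # size-based bpa (illustrative values)
mu_3 = {D: 0.30, TD: 0.70}  # heat-based bpa (illustrative values)

mu_0 = combine(combine(mu_1, mu_2), mu_3)
print(mu_0)
```

Because the rule is associative and commutative, the three bpa's may be folded together pairwise in any order.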

Block 120 then measures the error present in the bpa's based on the
information from the information sources 20. The goal of both supervised
and unsupervised learning is to minimize error in the bpa's. The calculation
of error is performed differently for unsupervised and supervised learning
applications. Additionally, the calculation of error is application dependent.
That is, there are numerous ways to express error terms other than the ways
shown below. In the case of supervised learning, where a desired bpa is
known, the error term may consist of the observed results from the
information sources 20 being subtracted from the desired results. This may
be represented by $E = (\mu_d - \mu_0)^2$, where $E$ is the error term, $\mu_d$
is the desired bpa, and $\mu_0$ is the combined bpa based on the information
from the information sources 20.
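
One possible reading of the supervised error term is sketched below in Python. The text does not state how the squared difference is taken over the sets of a bpa, so summing the squared differences over every set that carries mass in either bpa is an assumption of this sketch, as are the function name supervised_error and the example masses.

```python
T = frozenset({"target"})
TD = frozenset({"target", "decoy"})


def supervised_error(mu_d, mu_0):
    """Sum of squared differences between desired and combined bpa's.

    Assumption: the difference is taken set by set, with a missing set
    treated as carrying zero mass.
    """
    sets = set(mu_d) | set(mu_0)
    return sum((mu_d.get(s, 0.0) - mu_0.get(s, 0.0)) ** 2 for s in sets)


mu_d = {T: 1.0}            # desired bpa: the object is known to be a target
mu_0 = {T: 0.8, TD: 0.2}   # combined bpa (illustrative values)

print(supervised_error(mu_d, mu_0))
```

The error is zero only when the combined bpa matches the desired bpa on every set.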

Unsupervised learning relies on the qualities or characteristics of good
bpa's, and not on known bpa's like supervised learning. Block 120 can
calculate the error in the information through the use of various functions.
Two examples of such functions are shown below in equations (1) and (2).
Note that error calculations are application dependent and many other
implementations of error calculations may be used.

In equation (1), $\mu$ is the result of combining the bpa's of interest. The
value of $E'$ is minimum when all mass is devoted to one set containing a
single element. For example, in equation (1) $\mu(a)$ may be a target and
$\mu(b)$ may be a decoy, and both the target and the decoy are contained in
set W. Accordingly, equation (2) below recites,

$$E'' = \left[1 - \sum_{a} q(a)\right]^2 + \sum_{a} q(a)\,[1 - q(a)] \qquad (2)$$

In equation (2), $E''$ is a minimum when all mass is devoted to a single
element. The $q(a)$ term is the commonality function of interest. For
example, if three information sources produce three bpa's which are
converted to commonality functions $q_1$, $q_2$, and $q_3$, then $q = q_1 q_2 q_3$.
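
The unsupervised error of equation (2) might be computed as in the following Python sketch. It assumes that equation (2) has the form of a squared term, one minus the sum of q(a), plus the sum of q(a)[1 - q(a)], and that the sums run over the single elements of the frame. The summation range is not legible in the text, so that choice, together with the function names, is an assumption of this sketch. The commonality of a set is taken in the usual way as the total mass of every set that contains it.

```python
import math

FRAME = frozenset({"target", "decoy"})
T = frozenset({"target"})


def commonality(bpa, a):
    """q(a): total mass of every focal set that contains the set a."""
    return sum(m for s, m in bpa.items() if a <= s)


def unsupervised_error(bpas, frame):
    """Error of the assumed form [1 - sum q(a)]^2 + sum q(a)[1 - q(a)],
    with q the product of the commonality functions of the given bpa's."""
    singletons = [frozenset({x}) for x in frame]
    q = {a: math.prod(commonality(b, a) for b in bpas) for a in singletons}
    return (1.0 - sum(q.values())) ** 2 + sum(v * (1.0 - v) for v in q.values())


certain = {T: 1.0}                 # all mass on a single element
vacuous = {FRAME: 1.0}             # all mass uncommitted
mixed = {T: 0.55, FRAME: 0.45}     # example bpa from the text

print(unsupervised_error([certain], FRAME))
print(unsupervised_error([vacuous], FRAME))
print(unsupervised_error([mixed], FRAME))
```

Under these assumptions the error vanishes when all mass sits on one element and grows as mass is left uncommitted.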

After the error has been calculated, block 130 calculates the updates
that need to be made to each belief function. The updates are based on the
fact that minimal error is desired. The calculation of the updates may be
made using partial differentiation with respect to the bpa being updated. For
example, to train $\mu_1$, an update can be calculated using equation (3), which
is commonly known as the gradient-descent rule.

$$\Delta\mu_1 = -\frac{\partial E}{\partial \mu_1} \qquad (3)$$

In equation (3), $E$ is the error term calculated using either unsupervised or
supervised techniques. The calculations of error, partial derivatives, and
updates are very application dependent and are not limited to the equations
disclosed herein. By carrying out the partial differentiation shown in
equation (3), the error term, which is composed of multiple bpa's (e.g., $\mu_1$,
$\mu_2$, $\mu_3$), is differentiated with respect to one of the original bpa's
(e.g., $\mu_1$), yielding the update that needs to be made to $\mu_1$ to minimize
the error term ($E$).

After the bpa updates have been calculated, block 140 modifies the
belief functions by adding the updates to the bpa's, and passes program
control to block 100, which starts the learning process again.
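
The update and modification steps of blocks 130 and 140 can be illustrated with the simplified Python sketch below. Several elements are assumptions that do not come from the text: the error is a supervised squared error taken directly on a single bpa rather than on a combination, the partial derivatives are estimated by finite differences, a step-size parameter named rate scales the update, and the masses are clipped at zero and renormalized after each step so that they remain a valid bpa.

```python
T = frozenset({"target"})
TD = frozenset({"target", "decoy"})


def squared_error(mu_d, mu):
    """Set-by-set squared difference between a desired bpa and a bpa."""
    return sum((mu_d.get(s, 0.0) - mu.get(s, 0.0)) ** 2 for s in set(mu_d) | set(mu))


def gradient(error_fn, mu, eps=1e-6):
    """Central-difference estimate of dE/dmu(s) for each focal set s."""
    grad = {}
    for s in mu:
        up, down = dict(mu), dict(mu)
        up[s] += eps
        down[s] -= eps
        grad[s] = (error_fn(up) - error_fn(down)) / (2.0 * eps)
    return grad


def train_step(error_fn, mu, rate=0.1):
    """One pass: compute the updates (block 130) and add them (block 140).

    The rate, the clipping, and the renormalization are assumptions of this
    sketch and are not stated in the text.
    """
    grad = gradient(error_fn, mu)
    moved = {s: max(mu[s] - rate * grad[s], 0.0) for s in mu}
    total = sum(moved.values())
    return {s: m / total for s, m in moved.items()}


mu_d = {T: 0.9, TD: 0.1}     # desired bpa (illustrative values)
mu_1 = {T: 0.55, TD: 0.45}   # bpa being trained


def error_fn(mu):
    return squared_error(mu_d, mu)


start_error = error_fn(mu_1)
for _ in range(50):          # each pass stands in for a return to block 100
    mu_1 = train_step(error_fn, mu_1)

print(start_error, error_fn(mu_1))
```

In a fuller system the error function would be evaluated on the combined bpa, so that the derivative with respect to one source reflects its effect on the combination.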

Of course, it should be understood that a range of changes and
modifications can be made to the preferred embodiment described above.
For example, information sources in the system may be sensors or
information such as rules or opinions, which may be gathered from experts.
It is therefore intended that the foregoing detailed description be regarded
as illustrative rather than limiting and that it be understood that it is the
following claims, including all equivalents, which are intended to define the
scope of this invention.